Bugfix/duplicate columns #115
Thanks much! Will it be released soon?
{% for col in columns if col.name not in exclude %}
{%- for dupe in columns if col.name[prefix|length:]|lower == dupe.name|lower -%}
This nested loop is O(n^2), right? I'm still learning Jinja, but it seems to support dictionaries as well: https://documentation.bloomreach.com/engagement/docs/datastructures#dictionaries
@greg-finley Correct, it does result in O(n^2), and Jinja does support dictionaries. How are you proposing we leverage a dictionary to replace the nested loops?
It may be worthwhile to move forward with this solution in the immediate term so users can resolve the error, and then consider optimizing the macro down the road.
Yep, 100%, let's ship it.
If we had the list of columns in a dictionary or set, we could check in O(1) whether the duplicate name exists, instead of looking through the column list again.
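To make the set-based idea concrete, here is a minimal Python sketch of the lookup logic only. It is not the package's Jinja macro: the function name `find_prefix_duplicates` and the sample column names are hypothetical. It also assumes that only columns that actually start with the prefix should be stripped, which the nested-loop version does not check.

```python
def find_prefix_duplicates(column_names, prefix):
    """Return stripped names that collide with an existing column name.

    Hypothetical helper for illustration; not part of any package.
    """
    # Build the lookup set once: O(n).
    existing = {name.lower() for name in column_names}
    prefix = prefix.lower()

    duplicates = []
    for name in column_names:
        lowered = name.lower()
        # Assumption: only strip names that really carry the prefix.
        if lowered.startswith(prefix):
            stripped = lowered[len(prefix):]
            # Set membership is O(1) on average, so the whole pass is O(n).
            if stripped in existing:
                duplicates.append(stripped)
    return duplicates


cols = ["property_hs_meeting_outcome", "meeting_outcome", "property_hs_timestamp", "id"]
print(find_prefix_duplicates(cols, "property_hs_"))  # ['meeting_outcome']
```

In Jinja the same shape would likely need a dictionary built in a first loop and queried in a second. For the column counts in a typical source table the difference is probably small either way.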
I am very close to releasing this! Likely Monday morning at this point. I heard from our product team that these additional fields (the non
From looking at my own data, they seem to be all nulls, so I think removing or coalescing would have the same effect (though I guess slightly more efficient to avoid the coalesce)
PR Overview
This PR will address the following Issue/Feature: dbt_hubspot Issue 119
This PR will result in the following new package version:
v0.12.0
This will technically be a breaking change since it will remove (via a coalesce) existing impacted fields if duplicates are identified. As such, I would feel more comfortable with this being a breaking change so customers are aware of the upgrade being applied.
Please detail what change(s) this PR introduces and any additional information that should be known during the review of this PR:
🚨 Breaking Changes 🚨
`property_hs_` prefix from the source columns in the staging models. If a column with the prefix removed matches the same name as an existing column (for example, `property_hs_meeting_outcome` and `meeting_outcome` are both fields in the source table), then the new macro will coalesce the fields, giving preference to the `property_hs_` field, as this is likely the most relevant field per the latest HubSpot API upgrade.
stg_hubspot__engagement_call
stg_hubspot__engagement_company
stg_hubspot__engagement_contact
stg_hubspot__engagement_deal
stg_hubspot__engagement_email
stg_hubspot__engagement_meeting
stg_hubspot__engagement_note
stg_hubspot__engagement_task
stg_hubspot__ticket
stg_hubspot__ticket_company
stg_hubspot__ticket_contact
stg_hubspot__ticket_deal
stg_hubspot__ticket_engagement
stg_hubspot__ticket_property_history
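As a rough illustration of the coalesce behavior described for these models, the following Python sketch builds the select expressions such a macro might emit. It is an assumption-laden sketch, not the macro's actual implementation: `render_select_list` and the sample columns are hypothetical, and the real macro is written in Jinja and may differ in detail.

```python
def render_select_list(column_names, prefix, exclude=()):
    """Build select expressions, coalescing prefixed/unprefixed duplicates.

    Hypothetical helper for illustration only.
    """
    prefix = prefix.lower()
    excluded = {name.lower() for name in exclude}
    existing = {name.lower() for name in column_names}

    expressions = []
    for name in column_names:
        lowered = name.lower()
        if lowered in excluded:
            continue
        if lowered.startswith(prefix):
            stripped = lowered[len(prefix):]
            if stripped in existing:
                # Duplicate after prefix removal: prefer the prefixed field.
                expressions.append(f"coalesce({name}, {stripped}) as {stripped}")
            else:
                expressions.append(f"{name} as {stripped}")
        elif prefix + lowered in existing:
            # Already folded into the coalesce above, so skip it here.
            continue
        else:
            expressions.append(name)
    return expressions


columns = ["id", "property_hs_meeting_outcome", "meeting_outcome", "property_hs_timestamp"]
for expression in render_select_list(columns, "property_hs_"):
    print(expression)
# id
# coalesce(property_hs_meeting_outcome, meeting_outcome) as meeting_outcome
# property_hs_timestamp as timestamp
```

Listing the prefixed field first in the coalesce is what gives it preference: the unprefixed value is used only when the prefixed one is null.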
Feature Updates
`remove_duplicate_and_prefix_from_columns` has been included, which expands off the `fivetran_utils.remove_prefix_columns` macro by removing any duplicate columns that result from the prefix removal.
PR Checklist
Basic Validation
Please acknowledge that you have successfully performed the following commands locally:
Before marking this PR as "ready for review" the following have been applied:
Detailed Validation
Please acknowledge that the following validation checks have been performed prior to marking this PR as "ready for review":
These steps were validated by recreating the issue with the seed data for our integration tests and also via validation with the customer that the fix does in fact resolve the error they are seeing.
Standard Updates
Please acknowledge that your PR contains the following standard updates:
dbt Docs
Please acknowledge that after the above were all completed the below were applied to your branch:
If you had to summarize this PR in an emoji, which would it be?
2️⃣